-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use encoded type info to deduplicate extra data #381
Use encoded type info to deduplicate extra data #381
Conversation
Currently, we use TypeInfo.Identifier() to deduplicate extra data and type info. However, TypeInfo.Identifier() is implemented in another package and we can't enforce its uniqueness for different types. If TypeInfo.Identifier() returns same ID for different types, different types is wrongly deduplicated. This commit uses encoded type info via TypeInfo.Encode() to deduplicate extra data. This prevents differently encoded type info from being deduplicated by mistake. This commit also uses sync.Pool to reuse buffer for type info encoding.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/array-map-inlining #381 +/- ##
==============================================================
- Coverage 63.18% 63.10% -0.08%
==============================================================
Files 15 15
Lines 10955 10983 +28
==============================================================
+ Hits 6922 6931 +9
- Misses 3054 3064 +10
- Partials 979 988 +9 ☔ View full report in Codecov by Sentry. |
@turbolent @ramtinms PTAL 🙏 With this PR, full atree migration with Cadence 0.42 + atree inlining/deduplication + validation passed yesterday using
BTW, to avoid blocking team during 4+ hour test runs, I created and am using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Good idea to just use the encoded type info itself 👍
aID, _ := getEncodedTypeInfo(a) | ||
bID, _ := getEncodedTypeInfo(b) | ||
return aID == bID | ||
} | ||
|
||
func getEncodedTypeInfo(ti atree.TypeInfo) (string, error) { | ||
b := getTypeIDBuffer() | ||
defer putTypeIDBuffer(b) | ||
|
||
enc := cbor.NewStreamEncoder(b) | ||
err := ti.Encode(enc) | ||
if err != nil { | ||
return "", err | ||
} | ||
enc.Flush() | ||
|
||
return b.String(), nil | ||
} | ||
|
||
const defaultTypeIDBufferSize = 256 | ||
|
||
var typeIDBufferPool = sync.Pool{ | ||
New: func() interface{} { | ||
e := new(bytes.Buffer) | ||
e.Grow(defaultTypeIDBufferSize) | ||
return e | ||
}, | ||
} | ||
|
||
func getTypeIDBuffer() *bytes.Buffer { | ||
return typeIDBufferPool.Get().(*bytes.Buffer) | ||
} | ||
|
||
func putTypeIDBuffer(e *bytes.Buffer) { | ||
e.Reset() | ||
typeIDBufferPool.Put(e) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this code is a duplicate of the code in typeinfo.go
. Maybe the same code can be reused here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I don't like duplicate code too, but in this PR, I intentionally wrote same code in 2 packages within atree (atree
and stress test main
) in order to avoid exporting them and having to support their use by 3rd parties.
@fxamacker That's really great news, well done! 👏 🎉 Good idea to spin up a separate machine to avoid blocking 👍 |
This handles edge case for Cadence
type.ID()
returning same ID for different types.Currently, we use
TypeInfo.Identifier()
to deduplicate extra data and type info. However,TypeInfo.Identifier()
is implemented in another package and we can't enforce its uniqueness for different types. IfTypeInfo.Identifier()
returns same ID for different types, different types is wrongly deduplicated.This PR uses encoded type info via
TypeInfo.Encode()
to deduplicate extra data. This prevents differently encoded type info from being deduplicated by mistake.This PR also uses
sync.Pool
to reuse buffer for type info encoding.This problem was found by using latest testnet data for atree migration + Cadence v0.42. Prior tests using mainnet data did not encounter this issue.
main
branchFiles changed
in the Github PR explorer